
23 research outputs found

Supplementary Notes for Graph Theory 1:Including solutions for selected weekly exercises

Author: Wind David Kofoed
Publication venue: Technical University of Denmark
Publication date: 01/01/2014

Model Selection in Data Analysis Competitions

Author: Wind David Kofoed
Winther Ole
Publication date: 01/01/2014

Abstract. The use of data analysis competitions for selecting the most appropriate model for a problem is a recent innovation in the field of predictive machine learning. Two of the most well-known examples of this trend were the Netflix Competition and, more recently, the competitions hosted on the online platform Kaggle. In this paper, we will state and try to verify a set of qualitative hypotheses about predictive modelling, both in general and in the scope of data analysis competitions. To verify our hypotheses we will look at previous competitions and their outcomes, use qualitative interviews with top performers from Kaggle and use previous personal experiences from competing in Kaggle competitions. The stated hypotheses about feature engineering, ensembling, overfitting, model complexity and evaluation metrics give indications and guidelines on how to select a proper model for performing well in a competition on Kaggle.

CiteSeerX

Online Research Database In Technology

Inferring Person-to-person Proximity Using WiFi Signals

Author: Lehmann Sune
Leskovec Jure
Sapiezynski Piotr
Stopczynski Arkadiusz
Wind David Kofoed
Publication date: 15/10/2016

Today's societies are enveloped in an ever-growing telecommunication infrastructure. This infrastructure offers important opportunities for sensing and recording a multitude of human behaviors. Human mobility patterns are a prominent example of such a behavior which has been studied based on cell phone towers, Bluetooth beacons, and WiFi networks as proxies for location. However, while mobility is an important aspect of human behavior, understanding complex social systems requires studying not only the movement of individuals, but also their interactions. Sensing social interactions on a large scale is a technical challenge and many commonly used approaches, including RFID badges or Bluetooth scanning, offer only limited scalability. Here we show that it is possible, in a scalable and robust way, to accurately infer person-to-person physical proximity from the lists of WiFi access points measured by smartphones carried by the two individuals. Based on a longitudinal dataset of approximately 800 participants with ground-truth interactions collected over a year, we show that our model performs better than the current state-of-the-art. Our results demonstrate the value of WiFi signals in social sensing as well as potential threats to privacy that they imply.

arXiv.org e-Print Archive

Online Research Database In Technology

On the number of spanning trees in random regular graphs

Author: Greenhill Catherine
Kwan Matthew
Wind David Kofoed
Publication date: 01/01/2014

Let $d \geq 3$ be a fixed integer. We give an asymptotic formula for the expected number of spanning trees in a uniformly random $d$-regular graph with $n$ vertices. (The asymptotics are as $n\to\infty$, restricted to even $n$ if $d$ is odd.) We also obtain the asymptotic distribution of the number of spanning trees in a uniformly random cubic graph, and conjecture that the corresponding result holds for arbitrary (fixed) $d$. Numerical evidence is presented which supports our conjecture. Comment: 26 pages, 1 figure. To appear in the Electronic Journal of Combinatorics. This version addresses referee's comment
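For readers who want to experiment with the quantity studied in this abstract, the number of spanning trees of one fixed graph can be computed exactly with Kirchhoff's matrix-tree theorem. The sketch below is only an illustration and is not taken from the paper, which concerns asymptotic formulas for random regular graphs. The function name `count_spanning_trees` and the adjacency-matrix input format are assumptions made for this example.

```python
from fractions import Fraction


def count_spanning_trees(adjacency):
    """Count spanning trees of a simple undirected graph.

    `adjacency` is a symmetric 0/1 matrix (list of lists) with a zero diagonal.
    Uses the matrix-tree theorem: the count equals the determinant of the
    Laplacian with one row and the matching column removed.
    """
    n = len(adjacency)
    if n <= 1:
        return 1
    size = n - 1
    # Laplacian L = D - A with the last row and column dropped.
    minor = [
        [Fraction((sum(adjacency[i]) if i == j else 0) - adjacency[i][j])
         for j in range(size)]
        for i in range(size)
    ]
    # Determinant by Gaussian elimination over exact rationals.
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if minor[r][col] != 0), None)
        if pivot is None:
            return 0  # singular minor: the graph is disconnected
        if pivot != col:
            minor[col], minor[pivot] = minor[pivot], minor[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= minor[col][col]
        for r in range(col + 1, size):
            factor = minor[r][col] / minor[col][col]
            for c in range(col, size):
                minor[r][c] -= factor * minor[col][c]
    return int(det)


# K4 is the smallest cubic (3-regular) graph.
k4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
print(count_spanning_trees(k4))
```

Exact rational arithmetic is used so that the result is an integer without rounding error; for large graphs a floating-point or modular determinant would be faster.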

arXiv.org e-Print Archive

CiteSeerX

Online Research Database In Technology

Inferring Stop-Locations from WiFi

Author: Furman Magdalena Anna
Jørgensen Sune Lehmann
Sapiezynski Piotr
Wind David Kofoed
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016

Human mobility patterns are inherently complex. In terms of understanding these patterns, the process of converting raw data into series of stop-locations and transitions is an important first step which greatly reduces the volume of data, thus simplifying the subsequent analyses. Previous research into the mobility of individuals has focused on inferring 'stop locations' (places of stationarity) from GPS or CDR data, or on detection of state (static/active). In this paper we bridge the gap between the two approaches: we introduce methods for detecting both mobility state and stop-locations. In addition, our methods are based exclusively on WiFi data. We study two months of WiFi data collected every two minutes by a smartphone, and infer stop-locations in the form of labelled time-intervals. For this purpose, we investigate two algorithms, both of which scale to large datasets: a greedy approach to select the most important routers and one which uses a density-based clustering algorithm to detect router fingerprints. We validate our results using participants' GPS data as well as ground truth data collected during a two-month period.

Directory of Open Access Journals

PubMed Central

Online Research Database In Technology

String Matching with Variable Length Gaps

Author: David Kofoed Wind
Hjalte Wedel Vildhøj
Inge Li Gørtz
Philip Bille
Publication date: 01/01/2010

We consider string matching with variable length gaps. Given a string $T$ and a pattern $P$ consisting of strings separated by variable length gaps (arbitrary strings of length in a specified range), the problem is to find all ending positions of substrings in $T$ that match $P$. This problem is a basic primitive in computational biology applications. Let $m$ and $n$ be the lengths of $P$ and $T$, respectively, and let $k$ be the number of strings in $P$. We present a new algorithm achieving time $O(n\log k + m + \alpha)$ and space $O(m + A)$, where $A$ is the sum of the lower bounds of the lengths of the gaps in $P$ and $\alpha$ is the total number of occurrences of the strings in $P$ within $T$. Compared to the previous results this bound essentially achieves the best known time and space complexities simultaneously. Consequently, our algorithm obtains the best known bounds for almost all combinations of $m$, $n$, $k$, $A$, and $\alpha$. Our algorithm is surprisingly simple and straightforward to implement. We also present algorithms for finding and encoding the positions of all strings in $P$ for every match of the pattern. Comment: draft of full version, extended abstract at SPIRE 201
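To make the problem statement in this abstract concrete, the following is a minimal brute-force sketch of matching with variable length gaps. It is not the algorithm from the paper and does not achieve its time or space bounds; it only illustrates what counts as a match. The function name `vlg_match_ends`, the representation of the pattern as a list of strings plus a list of `(lower, upper)` gap ranges, and the convention of reporting end positions as exclusive indices are all assumptions made for this example.

```python
def vlg_match_ends(text, strings, gaps):
    """Return the sorted end positions of matches of a gapped pattern in `text`.

    The pattern is strings[0], gap 0, strings[1], gap 1, ..., strings[-1],
    where gap i may be any string whose length lies in gaps[i] = (lower, upper).
    End positions are exclusive indices (one past the last matched character).
    """
    if not strings or len(gaps) != len(strings) - 1:
        raise ValueError("need k >= 1 strings and exactly k - 1 gap ranges")

    # End positions of all occurrences of the first string.
    first = strings[0]
    ends = {i + len(first)
            for i in range(len(text) - len(first) + 1)
            if text.startswith(first, i)}

    # Extend every partial match by one gap and one string at a time.
    for piece, (lower, upper) in zip(strings[1:], gaps):
        next_ends = set()
        for end in ends:
            for gap in range(lower, upper + 1):
                start = end + gap
                if start + len(piece) > len(text):
                    break  # longer gaps would run past the end of the text
                if text.startswith(piece, start):
                    next_ends.add(start + len(piece))
        ends = next_ends
    return sorted(ends)


# "ab", then a gap of 1 to 4 arbitrary characters, then "cd".
print(vlg_match_ends("abxcdabxxxxcd", ["ab", "cd"], [(1, 4)]))
```

Tracking only the set of end positions keeps the sketch short, but each step can scan every allowed gap length for every partial match, so it can be far slower than the bound quoted in the abstract.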

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

Crossref

Online Research Database In Technology

Optimisation of Car Park Designs

Author: Billingham John
Bradshaw Joel
Bulkowski Marcin
Dawson Peter
Garbacz Pawel
Gilbert Mark
Gosce Lara
Hjort Poul
Homer Martin
Jeffrey Mike
Papavassiliou Dario
Porter Richard
Wind David Kofoed

The problem presented by ARUP to the UK Study Group 2014 was to investigate methods for maximising the number of car parking spaces that can be placed within a car park. This is particularly important for basement car parks in residential apartment blocks or offices where parking spaces command a high value. Currently the job of allocating spaces is done manually and is very time intensive. The Study Group working on this problem split into teams examining different aspects of the car park design process. There were three approaches taken. These approaches include a so-called "tile-and-trim" method in which an optimal layout of cars from an 'infinite car park' is overlaid onto the actual car park domain; adjustments are then made to accommodate access from one lane to the next. A second approach seeks to develop an algorithm for optimising the road within a car park on the assumption that car parking spaces should fill the space and that any space needs to be adjacent to the network. A third similar approach focused on schemes for assessing the potential capacity of a small selection of specified road networks within the car park to assist the architect in selecting the optimal road network(s). The problem is a variant of the "bin packing" problem, well known in computer science. It is further complicated by the fact that two different classes of item need to be packed (roads and cars), with both local (immediate access to a road) and global (connectivity of the road network) constraints. Bin-packing is known to be NP-hard, and hence the problem at hand has at least this level of computational complexity. None of the approaches produced a complete solution to the problem posed. Indeed, it was quickly determined by the group that this was a very hard problem (a view reinforced by the many different possible approaches considered) requiring far longer than a week to really make significant progress.
All approaches rely to differing degrees on optimisation algorithms which are inherently unreliable unless designed specifically for the intended purpose. It is also not clear whether a relatively simple automated computer algorithm will be able to "beat the eye of the architect"; additional sophistication may be required due to subtle constraints. Apart from determining that the problem is hard, positive outcomes have included: Determining that parking perpendicular to the road in long aisles provides the most efficient packing of cars. Provision of code which "tiles and trims" from an infinite car park onto the given car park with interactive feedback on the number of cars in the packing. Provision of code for optimal packing in a parallel-walled car park. Methods for optimising a road within a given domain based on developing cost functions ensuring that cars fill the car park and have access to the road. Provision of code for optimising a single road in a given (square) space. Description of methods for assessing the capacity of a car park for a set of given road networks in order to select optimal road networks. Some ideas for developing possible solutions further.

RUNTIME DICTIONARIES FOR ROOT

Author: Wind David Kofoed
Publication date: 06/09/2013

ROOT is the LHC physicists' common tool for data analysis; almost all data is stored using ROOT's I/O system. This system benefits from a custom description of types (a so-called dictionary) that is optimised for the I/O. Until now, the dictionary could not be provided at run-time; it had to be prepared in a separate prerequisite step. This project will move the generation of the dictionary to run-time, making use of ROOT 6's new just-in-time compiler. It allows a more dynamic and natural access to ROOT's I/O features, especially for user code.

CERN Document Server

Statistical Models for Wifi Data and Educational Peer Review

Author: Wind David Kofoed
Publication venue: DTU Compute
Publication date: 01/01/2018

Online Research Database In Technology